53 research outputs found

    Localising In-Domain Adaptation of Transformer-Based Biomedical Language Models

    In the era of digital healthcare, the huge volumes of textual information generated every day in hospitals constitute an essential but underused asset that could be exploited with task-specific, fine-tuned biomedical language representation models, improving patient care and management. For such specialized domains, previous research has shown that fine-tuning models stemming from broad-coverage checkpoints can benefit substantially from additional training rounds over large-scale in-domain resources. However, these resources are often out of reach for less-resourced languages like Italian, preventing local medical institutions from employing in-domain adaptation. To reduce this gap, our work investigates two accessible approaches to derive biomedical language models in languages other than English, taking Italian as a concrete use case: one based on neural machine translation of English resources, favoring quantity over quality; the other based on a high-grade, narrow-scoped corpus natively written in Italian, thus preferring quality over quantity. Our study shows that data quantity is a harder constraint than data quality for biomedical adaptation, but the concatenation of high-quality data can improve model performance even when dealing with relatively size-limited corpora. The models published from our investigations have the potential to unlock important research opportunities for Italian hospitals and academia. Finally, the set of lessons learned from the study constitutes valuable insight towards building biomedical language models that generalize to other less-resourced languages and different domain settings. Comment: 8 pages, 2 figures, 6 tables. Published in Journal of Biomedical Informatics.

    Assessing in-season crop classification performance using satellite data: a test case in Northern Italy

    This study investigated the feasibility of delivering a crop type map early during the growing season. Landsat 8 OLI multi-temporal data acquired in the 2013 season were used to classify seven crop types in Northern Italy. The accuracy achieved with four supervised algorithms, fed with multi-temporal spectral indices (EVI, NDFI, RGRI), was assessed as a function of the crop map delivery time during the season. Overall accuracy (Kappa) exceeds 85% (0.83) starting from mid-July, five months before the end of the season, when maximum accuracy is reached (OA=92%, Kappa=0.91). Among crop types, rice is the most accurately classified, followed by forages, maize and arboriculture, while soybean or double crops can be confused with other classes.

    Advancing Italian Biomedical Information Extraction with Large Language Models: Methodological Insights and Multicenter Practical Application

    The introduction of computerized medical records in hospitals has reduced burdensome operations like manual writing and information fetching. However, the data contained in medical records are still heavily underutilized, primarily because extracting them from unstructured textual medical records takes time and effort. Information Extraction, a subfield of Natural Language Processing, can help clinical practitioners overcome this limitation using automated text-mining pipelines. In this work, we created the first Italian neuropsychiatric Named Entity Recognition dataset, PsyNIT, and used it to develop a Large Language Model for this task. Moreover, we conducted several experiments with three external independent datasets to implement an effective multicenter model, with overall F1-score 84.77%, Precision 83.16%, Recall 86.44%. The lessons learned are: (i) the crucial role of a consistent annotation process and (ii) a fine-tuning strategy that combines classical methods with a "few-shot" approach. This allowed us to establish methodological guidelines that pave the way for future implementations in this field and allow Italian hospitals to tap into important research opportunities.

    Study of “Shaken Baby Syndrome”: Morphological and Diffusion MRI Data

    Shaken baby syndrome (SBS) is the most common cause of death related to child abuse; nonfatal consequences of SBS include varying degrees of visual, motor and cognitive impairment due to severe brain damage in almost 30% of infants with SBS. Brain damage results from biomechanical forces, swelling, ischemia and altered vascular autoregulation, and additionally from axonal damage [1]. In the present study we examine a cohort of 7 patients affected by SBS and compare their data with controls chosen from the same age range, 19 to 60 months. Using MRI techniques, we define a new paradigm for demonstrating, through voxel-based morphometry, white and grey matter deficiencies in the prefrontal cortex as well as in the hippocampus, amygdala, corpus callosum and optic radiation. By adding diffusion tensor imaging with constrained spherical deconvolution [2], our study highlights connectivity between the investigated areas, suggesting neural network abnormalities. With these state-of-the-art studies we can show a correlation between childhood abuse and modification of brain structures. Our aim is to conduct a longitudinal study of the anatomical data of these patients, following their clinical evolution.

    Downstream Services for Rice Crop Monitoring in Europe: From Regional to Local Scale

    The ERMES agromonitoring system for rice cultivations integrates EO data at different resolutions, crop models, and user-provided in situ data in a unified system, which drives two operational downstream services for rice monitoring. The first is aimed at providing information concerning the behavior of the current season at regional/rice district scale, while the second is dedicated to providing farmers with field-scale data useful to support more efficient and environmentally friendly crop practices. In this contribution, we describe the main characteristics of the system, in terms of overall architecture, technological solutions adopted, characteristics of the developed products, and functionalities provided to end users. Distinctive features of the system are its ability to cope with the needs of different stakeholders within a common platform, and its tight integration of EO data processing and information retrieval, crop modeling, in situ data collection, and information dissemination. The ERMES system has been operationally tested in three European rice-producing countries (Italy, Spain, and Greece) during growing seasons 2015 and 2016, providing a great amount of near-real-time information concerning rice crops. Highlights of significant results are provided, with particular focus on real-world applications of ERMES products and services. Although developed with a focus on European rice cultivations, solutions implemented in the ERMES system can be, and are already being, adapted to other crops and/or areas of the world, thus making it a valuable test bed for the development of advanced, integrated agricultural monitoring systems.

    Pervasive gaps in Amazonian ecological research

    Biodiversity loss is one of the main challenges of our time,1,2 and attempts to address it require a clear understanding of how ecological communities respond to environmental change across time and space.3,4 While the increasing availability of global databases on ecological communities has advanced our knowledge of biodiversity sensitivity to environmental changes,5–7 vast areas of the tropics remain understudied.8–11 In the American tropics, Amazonia stands out as the world’s most diverse rainforest and the primary source of Neotropical biodiversity,12 but it remains among the least known forests in America and is often underrepresented in biodiversity databases.13–15 To worsen this situation, human-induced modifications16,17 may eliminate pieces of the Amazon’s biodiversity puzzle before we can use them to understand how ecological communities are responding. To increase generalization and applicability of biodiversity knowledge,18,19 it is thus crucial to reduce biases in ecological research, particularly in regions projected to face the most pronounced environmental changes. We integrate ecological community metadata of 7,694 sampling sites for multiple organism groups in a machine learning model framework to map the research probability across the Brazilian Amazonia, while identifying the region’s vulnerability to environmental change. 15%–18% of the most neglected areas in ecological research are expected to experience severe climate or land use changes by 2050. This means that unless we take immediate action, we will not be able to establish their current status, much less monitor how it is changing and what is being lost.